Robust parsing of natural language descriptions expressed in telegraphic style
Abstract
Sublanguages represent an important application area for natural language understanding (Grishman and Kittredge, 1986). Their syntactic simplicity and reduced semantic variability provide clear computational advantages. In this paper we consider a sublanguage currently used for the official publication of business activities, characterized by a telegraphic style typical of commercial ads. Morphological and syntactic ill-formedness is very frequent in this sublanguage, so a robust parser is essential. The corpus we considered was extracted from the on-line archives of the Italian Chambers of Commerce and contains about 4 million descriptions of economic activities, which represent an important source of information about the structure of the Italian economy. Since our main goal is intelligent information retrieval, only part of the information contained in the sentences is considered relevant: nouns, prepositions and noun modifiers, and verbs only in their nominalized or infinitive form. The peculiarity of the parsing approach described in the paper is that we limit the syntactic analysis to the elementary relationships occurring among these elements, discarding whatever is not recognized by the morphological analyzer and giving up the attempt to reconstruct the syntactic tree of the whole sentence.
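As a rough illustration of the idea in the abstract, the following minimal sketch keeps only tokens that a lexicon recognizes and then extracts local noun-modifier and noun-preposition-noun links, without building a full syntactic tree. It is not the authors' implementation: the lexicon `LEXICON`, the tag names, the relation labels and the functions `analyze` and `elementary_relations` are all hypothetical, and a real system would use a proper Italian morphological analyzer.

```python
# Toy lexicon standing in for a morphological analyzer.
# All entries and tag names are hypothetical, chosen for illustration only.
LEXICON = {
    "produzione": "NOUN",
    "mobili": "NOUN",
    "legno": "NOUN",
    "di": "PREP",
    "in": "PREP",
    "artigianale": "ADJ",
}


def analyze(tokens):
    """Keep only tokens found in the lexicon; discard everything else."""
    return [(t, LEXICON[t]) for t in tokens if t in LEXICON]


def elementary_relations(text):
    """Extract local relations between adjacent recognized tokens.

    No attempt is made to build a parse tree for the whole description.
    """
    tagged = analyze(text.lower().split())
    relations = []
    for i, (word, tag) in enumerate(tagged):
        if tag != "NOUN":
            continue
        # noun followed by an adjective -> modifier relation
        if i + 1 < len(tagged) and tagged[i + 1][1] == "ADJ":
            relations.append((word, "mod", tagged[i + 1][0]))
        # noun + preposition + noun -> relation labeled by the preposition
        if (i + 2 < len(tagged)
                and tagged[i + 1][1] == "PREP"
                and tagged[i + 2][1] == "NOUN"):
            relations.append((word, tagged[i + 1][0], tagged[i + 2][0]))
    return relations


if __name__ == "__main__":
    # "xyz" is not in the lexicon, so it is dropped before any relation is built.
    print(elementary_relations("Produzione artigianale xyz di mobili in legno"))
```

Because unrecognized tokens are dropped first, a misspelled or unknown word does not block the analysis of the words around it, which is the robustness property the abstract emphasizes. The sketch only links strictly adjacent recognized tokens, so it misses relations that span an intervening modifier.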
Similar resources
Statistical Parsing of Messages
Message Processing. The recent trend in natural language processing research has been to develop systems that deal with text concerning small, well-defined domains. One practical application for such systems is to process messages pertaining to some very specific task or activity [5]. The advantage of dealing with such domains is twofold: firstly, due to the narrowness of the domain, it is possib...
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of natural language sentences, producing a dependency graph for each input sentence. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
Building a Dependency-Based Grammar for Parsing Informal Mathematical Discourse
Discourse in formal domains, such as mathematics, is characterized by a mixture of telegraphic natural language and embedded formal expressions. Little is known about the suitability of input analysis methods for mathematical discourse in a dialog setting, due to the lack of empirical data. In this paper, we report on the development of a dependency-based lexicalist grammar for parsing input in...
SemScribe: Natural Language Generation for Medical Reports
Natural language generation in the medical domain is heavily influenced by domain knowledge and genre-specific text characteristics. We present SemScribe, an implemented natural language generation system that produces doctor’s letters, in particular descriptions of cardiological findings. Texts in this domain are characterized by a high density of information and a relatively telegraphic style...
Model Theoretic Syntax and Parsing: An Application to Temporal Logic
In general, model-theoretic approaches to syntax aim to describe the syntactic structures of natural languages by means of logical formulae so that the structures are models of these formulae. However, in practice, the use of such descriptions for processing issues involves so much complexity, that they appear to be more or less useless for such tasks. We present a simple formalism, which is ba...
Publication date: 1992